Estimating generation time of SARS-CoV-2 variants in Italy from the daily incidence rate

The identification of the transmission parameters of a virus is fundamental to identifying the optimal public health strategy. These parameters can change significantly over time because of genetic mutations or viral recombination, which makes their continuous monitoring essential. Here we present a method suited to this task that uses only the daily number of reported cases. The method is based on a time-since-infection model in which the transmission parameters are obtained through an efficient maximization of the likelihood. Applying the method to SARS-CoV-2 data in Italy, we find an average generation time $\overline{z}=3.2 \pm 0.8$ days during the temporal window when the majority of infections can be attributed to the Omicron variants. We find a significantly larger value, $\overline{z}=6.2\pm 1.1$ days, in the temporal window when spreading was dominated by the Delta variant. We are also able to show that the presence of the Omicron variant, characterized by a shorter $\overline{z}$, was already detectable in the first weeks of December 2021, in full agreement with results provided by sequences of SARS-CoV-2 genomes reported in national databases.
Our results therefore show that the novel approach can indicate the existence of virus variants, which makes it particularly useful when genomic sequencing information is not yet available. At the same time, we find that the standard deviation of the generation time does not change significantly among variants.

In Fig. 2 of the main text, we present the dependence of $LL_{\rm best}(\overline{z}, \sigma)$ on the optimal series $\{R_c\}$ and $\{\mu\}$ for different choices of the parameters $a$ and $\tau$ of the Gamma-distributed $w(z)$.
In this section, we perform the same study but assume that $w(z)$ follows either a log-normal or a Weibull distribution. Both distributions depend on two parameters, $\mu$ and $\lambda$, which can be fully expressed in terms of the average value $\overline{z}$ and the standard deviation $\sigma$. Specifically, for the log-normal distribution we have $\overline{z} = e^{\mu+\lambda^2/2}$ and $\sigma = \overline{z}\sqrt{e^{\lambda^2} - 1}$. For the Weibull distribution we have $\overline{z} = \lambda\,\Gamma(1 + 1/\mu)$ and $\sigma = \lambda\sqrt{\Gamma(1 + 2/\mu) - \Gamma(1 + 1/\mu)^2}$.
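The relations between $(\mu, \lambda)$ and $(\overline{z}, \sigma)$ can be checked numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes the standard parameterizations implied by the formulas above (log-mean $\mu$ and log-standard-deviation $\lambda$ for the log-normal, shape $\mu$ and scale $\lambda$ for the Weibull).

```python
import math


def lognormal_moments(mu, lam):
    """Mean and standard deviation of a log-normal with parameters (mu, lam)."""
    mean = math.exp(mu + lam**2 / 2)
    sd = mean * math.sqrt(math.exp(lam**2) - 1)
    return mean, sd


def lognormal_params(mean, sd):
    """Invert the log-normal relations: recover (mu, lam) from (mean, sd)."""
    lam_sq = math.log(1 + (sd / mean) ** 2)
    mu = math.log(mean) - lam_sq / 2
    return mu, math.sqrt(lam_sq)


def weibull_moments(mu, lam):
    """Mean and standard deviation of a Weibull with shape mu and scale lam."""
    g1 = math.gamma(1 + 1 / mu)
    g2 = math.gamma(1 + 2 / mu)
    return lam * g1, lam * math.sqrt(g2 - g1**2)
```

For the log-normal case the mapping is invertible in closed form, so a target pair such as $\overline{z} = 3.2$ days with an illustrative $\sigma$ can be converted to $(\mu, \lambda)$ and back. For the Weibull case no closed-form inverse exists, and the shape $\mu$ would have to be found numerically.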
We plot the optimal $LL_{\rm best}(\overline{z}, \sigma)$ as a function of $\overline{z}$ for different values of $\lambda$ in Fig. Suppl.3 and Fig. Suppl.4 for the log-normal and Weibull distributions, respectively. For comparison, we also include the data from Fig. 2 of the main text for $LL_{\rm best}(\overline{z}, \sigma)$ in the case of a Gamma-distributed $w(z)$ for the smallest values of $\tau$.
In the case of the log-normal distribution (Fig. Suppl.3), $LL_{\rm best}(\overline{z}, \sigma)$ as a function of $\overline{z}$ is very similar to that obtained for the Gamma distribution. Specifically, we find that the maximum value of $LL_{\rm best}(\overline{z}, \sigma)$ corresponds to the same values of $\overline{z}$ and $\sigma$ for both the Gamma and log-normal distributions.
For the Weibull distribution (Fig. Suppl.4), the comparison is less direct since the average value $\overline{z}$ is non-monotonic in $\mu$ for fixed $\lambda$. Nevertheless, we still find that $LL_{\rm best}(\overline{z}, \sigma)$ provides the same value of $\overline{z}$ obtained for the Gamma and log-normal distributed $w(z)$.
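The non-monotonic behaviour of the Weibull mean can be seen by scanning the shape parameter at fixed scale. This is only an illustrative sketch: the grid of shape values and the choice $\lambda = 1$ are arbitrary assumptions, not values taken from the analysis.

```python
import math


def weibull_mean(mu, lam):
    """Mean of a Weibull distribution with shape mu and scale lam."""
    return lam * math.gamma(1 + 1 / mu)


lam = 1.0  # fixed scale (arbitrary illustrative value)
shapes = [0.5, 1.0, 2.0, 5.0, 20.0]  # arbitrary grid of shape values
means = [weibull_mean(mu, lam) for mu in shapes]

for mu, mean in zip(shapes, means):
    print(f"mu = {mu:5.1f}   mean = {mean:.4f}")
```

The mean equals $2\lambda$ at $\mu = 0.5$ and $\lambda$ at $\mu = 1$, drops to $\lambda\sqrt{\pi}/2 \approx 0.886\,\lambda$ at $\mu = 2$, and then rises back towards $\lambda$ as $\mu$ grows. Two different shape values can therefore give the same $\overline{z}$ at fixed $\lambda$.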
Based on these results, we conclude that there is no significant difference among a Gamma, log-normal, or Weibull distributed $w(z)$: all of them lead to very similar optimal values of $\overline{z}$ and $\sigma$.

THE INFLUENCE OF UNREPORTED INFECTED ON $\overline{z}$
The daily incidence rate is clearly correlated with the number of performed tests. Indeed, the higher the number of performed tests, the larger the probability of identifying an asymptomatic infected individual. In [1], we developed a procedure to disentangle the daily number of infected $I$ from the daily test number $n_T$. It is based on the assumption that the number of identified infected individuals during the $m$-th day can be viewed as the sum of two contributions, $I(m) = I^{(\phi)}(m) + I_{ran}(m)$. Here, $I_{ran}(m)$ represents asymptomatic individuals who are identified as infected on the $m$-th day essentially by chance, according to a random search within a population $N_P$.
Indicating with $I_{TOT}(m)$ the total number of new infected individuals during the $m$-th day, and taking into account that the search is not fully random but is usually focused on a subset $\phi_1 N_P$ of the total population, we have $I_{ran}(m) = \frac{n_T(m)\, I_{TOT}(m)}{\phi_1 N_P}$, with $\phi_1 < 1$. The quantity $I^{(\phi)}(m)$, on the other hand, includes all infected individuals with symptoms and all individuals who have been in strict contact with them. It is reasonable to assume that these individuals are always tested, and therefore their identified infection is not related to the daily number of performed tests. We define it as the "disentangled" incidence rate since we expect that its value does not depend on $n_T(m)$. Assuming that $I^{(\phi)}(m)$ is a fixed fraction $\phi_2 < 1$ of the total number of infected individuals, $I^{(\phi)}(m) = \phi_2 I_{TOT}(m)$, we obtain $I(m) = \phi_2 I_{TOT}(m) + \frac{n_T(m)\, I_{TOT}(m)}{\phi_1 N_P}$. Eliminating $I_{TOT}(m)$, the disentangled daily incidence rate $I^{(\phi)}(m)$ can be written as
$$I^{(\phi)}(m) = \frac{I(m)}{1 + \dfrac{n_T(m)}{\phi N_P}}, \qquad (1)$$
where $\phi = \phi_1 \phi_2$ is a parameter. The value of $\phi$ can be fixed by imposing that $\{I^{(\phi)}\}$ is not causally related to $\{n_T\}$. This procedure leads [1] to $\phi = 5 \times 10^{-4}$, which is the value used in our analysis. More precisely, we apply our algorithm for the evaluation of the log-likelihood to the disentangled incidence rate $I^{(\phi)}(m)$ defined in Eq. (1), instead of the reported incidence rate $I(m)$, and we repeat the same analysis as in Fig. 2 of the main text. The results (Fig. Suppl.6) give substantially the same value of $\overline{z}$ as those of Fig. 2 of the main text in each of the four temporal windows. This indicates that our findings are quite stable with respect to the number of asymptomatic undetected people.
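As an illustration of how the correction acts on a daily series, the sketch below applies the relation $I^{(\phi)}(m) = I(m) / \left(1 + n_T(m)/(\phi N_P)\right)$, which follows from eliminating $I_{TOT}(m)$ between the two relations given in the text. The function name, the population size, and the case and test counts are hypothetical values chosen only for the example. Only $\phi = 5 \times 10^{-4}$ is taken from the text.

```python
def disentangled_incidence(reported, tests, phi, population):
    """Return I_phi(m) = I(m) / (1 + n_T(m) / (phi * N_P)) for each day m."""
    if len(reported) != len(tests):
        raise ValueError("reported and tests must have the same length")
    scale = phi * population
    return [i / (1 + n / scale) for i, n in zip(reported, tests)]


phi = 5e-4               # value quoted in the text
population = 10_000_000  # hypothetical N_P, for illustration only
reported = [1000, 1000, 1000]  # hypothetical daily reported cases I(m)
tests = [5000, 20000, 0]       # hypothetical daily tests n_T(m)

corrected = disentangled_incidence(reported, tests, phi, population)
```

With these numbers $\phi N_P = 5000$, so the same reported count is reduced by a factor of 2 on a day with 5000 tests and by a factor of 5 on a day with 20000 tests, while a day with no tests is left unchanged.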
FIG. Suppl.6: The log-likelihood $LL_{\rm best}(\overline{z}, \sigma)$, evaluated for the temporal profile of $R_c(m)$ which maximizes the likelihood for the disentangled incidence rate in Lombardy, is plotted as a function of $\overline{z} = a\tau$. The four different panels correspond to the four temporal windows:

ANALYSIS FOR OTHER ITALIAN REGIONS
In this section, we repeat the analysis performed in the main text for Lombardy for other regions among those with the largest number of inhabitants, namely Lazio, Campania, Veneto, and Emilia-Romagna.